Syntax analysis method and apparatus

ABSTRACT

The present invention provides a high-precision syntactic structure analysis method that contributes to the promotion of precise language processing techniques. A monolingual document and a document translated from the monolingual document are input. If the syntactic structure analysis of the monolingual document, such as a dependency structure analysis, yields a plurality of analysis results among which it is difficult to choose, the dependency structure of the translation document is examined, and an optimum dependency structure analysis is performed based on the examination result.

TECHNICAL FIELD

The present invention relates to a technique for heightening the precision of syntactic structure analysis in language processing and, more specifically, to a technique for heightening the precision of the syntactic structure analysis by inputting a plurality of languages.

BACKGROUND ART

The development of techniques for parsing or generating a text of a language with a computer has advanced considerably. Machine translation systems and summarizing systems based on such techniques are available.

A syntactic structure analysis technique for analyzing a dependency structure in a sentence is very important for understanding context precisely, and studies have been made to develop high-precision parsing techniques.

When a language that is ambiguous in dependency and in which words are frequently omitted, such as Japanese language, is analyzed, a plurality of analysis results are possible. It is not rare that the analysis result becomes uncertain. A word typically has a plurality of meanings, and if only one language is analyzed, it is frequently uncertain in which meaning the word is used.

In a known syntactic structure analysis, a great deal of grammatical information is provided in connection with a language to be parsed in an attempt to heighten analysis precision. However, such a technique merely allows a probabilistically more appropriate meaning to be selected, and does not necessarily lead to a correct analysis result.

DISCLOSURE OF INVENTION

It is an object of the present invention to provide a high-precision syntactic structure analysis method that contributes to the promotion of precise language processing techniques. To this end, the following parsing method and parsing apparatus are provided.

The syntactic structure analysis method of the present invention allows a higher precision syntactic structure analysis to be performed by inputting not only one language text to be parsed, as input in a known syntactic structure analysis method, but also a translation text of a language different from the original text.

More specifically, the following technique is used. An original text to be parsed and at least one translation text, at least a portion of which is in translation relation to the original text, are input.

The original text and the translation text are then parsed. Not all sentences are necessarily parsed. The original text is parsed, while the translation text is parsed as necessary.

If at least two pieces of syntactic structure analysis information are obtained from the original text, in other words, if the syntactic structure analysis of the original text results in a plurality of pieces of the analysis information and it is difficult to determine optimum analysis information, the syntactic structure analysis result of the translation text is used.

If a plurality of translation texts are available, information of the translation text providing the most likely analysis information is used to identify an optimum result of the original text from the plurality of pieces of syntactic structure analysis information of the original text.

The identified result is output as the syntactic structure analysis result appropriate for the original text. Syntactic structure analysis that has been difficult in a conventional single-language system thus yields a high-precision analysis result.

If syntactic structure analysis information having at least two pieces of word meaning information is obtained from the original text, the ambiguity of word meaning is resolved by acquiring the syntactic structure analysis information from the word meaning information of any translation text. Based on the word meaning thus fixed, syntactic structure analysis may be performed on the original text.

The syntactic structure analysis method of the present invention may be introduced in a process of generating a third language in response to the input of a plurality of languages. It is known that when a third language is generated from a given language, a more precise result is provided by the use of a plurality of languages than by the use of a single language only.

The present invention provides a language processing parsing apparatus.

The parsing apparatus includes original text input means for inputting an original text to be parsed, and translation text input means for inputting a translation text, at least a portion of which is in translation relation to the original text, with a translation relation being associated therebetween.

Morphological analysis means morphologically analyzes the input original text and the input translation text.

Parsing means parses the morphologically analyzed result, by syntactically analyzing all morphemes of the original text and at least required morphemes of the translation text.

The parsing apparatus includes optimum result identification means for identifying the optimum syntactic structure analysis result of the original text by referencing the syntactic structure analysis result of the translation text if a plurality of pieces of syntactic structure analysis information is acquired from the original text or one of the plurality of pieces of syntactic structure analysis result fails to exceed a predetermined likelihood.

The parsing apparatus outputs an optimum result through syntactic structure analysis result output means.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a flowchart for converting a monolingual document to a target language text and generating the target language document in a known technique.

FIG. 2 is a flowchart of a translation system that appropriately incorporates a parsing apparatus of the present invention.

FIG. 3 illustrates a configuration of the parsing apparatus of the present invention.

Reference numerals designate the following elements: 20 a: monolingual document, 20 b: translation document, 21: parsing apparatus of the present invention, 30: CPU, 31: reader, 32: external storage device, 33: ROM and RAM, 34: morphological analysis step, 35: dependency analysis step, 36: case analysis step, 37: translation document searching step, and 38: translation document dependency structure analysis step.

BEST MODE FOR CARRYING OUT THE INVENTION

The embodiments of the present invention will now be discussed with reference to the drawings.

The present invention provides a technique to perform a syntactic structure analysis at a level of precision that is considered difficult to reach with a conventional syntactic structure analysis technique. More specifically, the present invention provides an extremely high-precision syntactic structure analysis technique using texts in a plurality of languages that have been translated with high precision by human beings, for example, Japanese language and English language.

In one application example, the present invention is incorporated in a translation system, in which an original language document to be parsed and a language document translated from the original language are input to generate a target language.

FIG. 1 is a flowchart for converting a monolingual document to a target language text and generating the target language document in a known technique. FIG. 2 is a flowchart of converting Japanese language and English language to a target language to generate the target language in accordance with the present invention.

A known translation process of translating a monolingual document (10) to a target language document (14) is typically performed by a syntax analyzer (11), a converter (12), and a generator (13) as major elements. The development of the syntax analyzer (11), the converter (12) and the generator (13) essentially requires manual production of rules (15). A great number of documents must be analyzed to develop a high-precision system. For example, large costs and a vast amount of study are required to develop a large-scale corpus for use in learning. Such corpora are currently being produced for major languages, but hopes are low that corpora will be produced for non-major languages.

FIG. 2 illustrates a translation system that precisely translates to a target language using a monolingual document (20 a), written in one of the major languages for which a corpus has been organized, and a translation language document (20 b) that is a parallel correspondence of the monolingual document (20 a).

In the system, input means (not shown) for inputting at least two translation texts inputs documents. The translation texts in each of the languages or in any combination of the languages thus reach a parser (21) of the present invention as analyzing means for analyzing language information.

The system includes a converter (22) as converting means for converting the language to a third language in response to an analysis result of the parser (21), and a generator (23) as generating means for generating a text of the third language in response to the conversion result of a converting step. The converter (22) and the generator (23) contain knowledge (25) for conversion and linguistic knowledge (26) for generation, respectively.

Finally, the generator (23) outputs the target language document (24).

Input language documents are a Japanese language document and an English language document with one translated from the other. In this case, one document may be a full or a partial translation of the other entire document. The number of input languages is at least two, and a high-precision syntactic structure analysis is performed on a third language.

A combination of translation languages in the present invention may be Japanese language and English language, or Japanese language and Chinese language, or a third language therefrom. The use of languages in different language families is preferable. For example, if only English language and French language are used, the effectiveness of the present invention is not so large. However, if English language, French language, and Japanese language are combined, higher precision analysis is expected than in a combination of English language and Japanese language only. Such a combination is preferable.

The parser (21) of the present invention will now be discussed in detail.

The system analyzes a dependency structure (modification relation) between words (or bunsetu, a phrase unit in Japanese language larger than a word) in response to two documents in Japanese language and English language (20 a)(20 b) with one translated from the other. The dependency structure may be determined by applying, to another language, a dependency model in Japanese language proposed by the applicant of this application (“kouhou bunmyaku wo kouryoshita kakariuke model” (Dependency Model Using Posterior Context), authored by K. Uchimoto, M. Murata, S. Sekine, and H. Isahara, Journal of Natural Language Processing Volume 7, No. 5, pp. 3-17 (2000)).

That model is used to learn whether two words (or bunsetu) are dependent on each other, and is implemented using a machine learning model. The dependency structure is determined so that the product of the dependency probabilities over one entire sentence, as calculated by the learned model, is maximized.
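The selection rule described above can be sketched as follows. This is only an illustration under stated assumptions: the bunsetu are represented by hypothetical indices 0 to 3, the probability table is invented rather than produced by a learned model, and the candidate structures are assumed to be enumerated elsewhere.

```python
from math import prod

# Hypothetical probabilities P(modifier index -> head index) for a sentence of
# four bunsetu (0..3). The numbers are invented for illustration only; in the
# described method they would come from the learned dependency model.
PAIR_PROB = {
    (0, 3): 0.9,
    (1, 2): 0.5,
    (1, 3): 0.4,
    (2, 3): 0.95,
}

def sentence_probability(structure):
    """Product of the probabilities of all dependencies in one candidate structure."""
    return prod(PAIR_PROB[pair] for pair in structure)

def best_structure(candidates):
    """Return the candidate structure with the largest whole-sentence product."""
    return max(candidates, key=sentence_probability)

# Two candidate structures that differ only in the head of bunsetu 1.
candidate_a = [(0, 3), (1, 2), (2, 3)]
candidate_b = [(0, 3), (1, 3), (2, 3)]

best = best_structure([candidate_a, candidate_b])
```

With these invented numbers the two products are close to each other, which is the kind of situation in which a monolingual analysis alone remains uncertain.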

A case analysis (semantic analysis) is performed on the dependency structure. In the processing of the dependency structure, the effectiveness of the two translation languages is measurable as an increase in the correct answer rate of dependency in the dependency structure.

FIG. 3 illustrates a configuration of the parsing apparatus of the present invention. The apparatus (21) includes a CPU (30), a reader (31), an external storage unit (32), and a ROM and RAM unit (33), and the ROM and RAM unit (33) stores, as necessary, the process performed by the CPU (30).

The result of the syntactic structure analysis is output to the ROM and RAM unit (33) for storage, and is then subjected to the process of the converter (22).

In a morphological analysis step (34), the CPU (30) morphologically analyzes an input monolingual document (here, a Japanese language document) (20 a) and a translation language document (here, an English language document) (20 b). In the morphological analysis, part of speech, etc. may be assigned by referencing a morphological analysis dictionary stored in the external storage unit (32).

The dependency structure between words in the Japanese language document (20 a) is analyzed based on the result of the morphological analysis (dependency structure analysis step (35)).

If the dependency structure analysis step (35) results in one analysis result, or if the analysis result shows a likelihood equal to or higher than a predetermined threshold in the machine learning, the case analysis is performed in a case analysis step (36). The result of the case analysis step (36) is stored in the external storage unit (32).

Generally speaking, it is difficult to determine a precise dependency structure in response to the mere input of the monolingual document. In the dependency structure analysis step (35), particularly important information is word order. For example, a Japanese sentence “watashi wa (I) shojo (girl) to inu (dog) wo mita (saw).” may be interpreted as stating “‘watashi’ ga ‘shojo to inu wo mita’” (I saw a girl and a dog.) or “‘watashi’ ga ‘shojo’ to tomoni ‘inu wo mita’” (I and a girl saw a dog).

In accordance with the present invention, a translation portion of the English document is analyzed to determine which analysis result is correct.

If a plurality of analysis results are obtained in the dependency structure analysis step (35), and it is impossible to determine which analysis result is appropriate, the algorithm proceeds to a translation searching step (37) to search for a portion of the English document (20 b) corresponding to the sentence in question of the Japanese document (20 a).
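The branching between the case analysis step (36) and the translation searching step (37) might be sketched as below. The function name, the returned step labels, and the threshold value 0.8 are assumptions made for illustration; the description only states that a predetermined threshold exists.

```python
def next_step(candidates, threshold=0.8):
    """Decide which step follows the dependency structure analysis step (35).

    candidates: list of (structure, likelihood) pairs from step (35).
    threshold:  hypothetical stand-in for the predetermined likelihood threshold.
    """
    if not candidates:
        raise ValueError("step (35) produced no analysis result")
    # A single analysis result goes straight to the case analysis.
    if len(candidates) == 1:
        return "case_analysis_step_36"
    # Several results, but the best one is confident enough.
    best_likelihood = max(likelihood for _, likelihood in candidates)
    if best_likelihood >= threshold:
        return "case_analysis_step_36"
    # Otherwise the translation document is consulted.
    return "translation_searching_step_37"

single = next_step([("structure A", 0.55)])
confident = next_step([("structure A", 0.92), ("structure B", 0.05)])
uncertain = next_step([("structure A", 0.43), ("structure B", 0.34)])
```

Only the third call falls through to the translation search, which matches the case in which it is impossible to determine which analysis result is appropriate.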

In the translation searching step (37), a known language processing technique for extracting a mutual relationship between two texts may be used. For example, a translation sentence association apparatus disclosed in Japanese Patent 3311567 may be used.

When the translation sentence is found in the search, a dependency structure in the sentence is analyzed (translation document dependency structure analysis step (38)).

Referring to the translation sentence found in the search, “I saw a girl and a dog.”, in the above example, the former interpretation “‘watashi’ ga ‘shojo to inu wo mita’” is easily determined to be appropriate. In the case of the latter analysis result “‘watashi’ ga ‘shojo’ to tomoni ‘inu wo mita’”, the corresponding translation sentence would have to be in the order “I and a girl saw a dog”, which is inconsistent with the sentence found in the search.
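The consistency check in this example can be sketched as follows. The role-based representation, the word alignment table, and the function names are all assumptions introduced for illustration; the English roles are taken as given, standing in for the output of the translation document dependency structure analysis step (38).

```python
# Hypothetical word alignment between the Japanese sentence and its translation.
ALIGNMENT = {"watashi": "I", "shojo": "girl", "inu": "dog", "mita": "saw"}

# The two candidate interpretations of "watashi wa shojo to inu wo mita",
# expressed as the subject and object of the verb "mita".
CANDIDATES = {
    "I saw [a girl and a dog]": {"subject": {"watashi"}, "object": {"shojo", "inu"}},
    "[I and a girl] saw a dog": {"subject": {"watashi", "shojo"}, "object": {"inu"}},
}

def project(roles):
    """Map the Japanese words of each role onto their aligned English words."""
    return {role: frozenset(ALIGNMENT[w] for w in words) for role, words in roles.items()}

def consistent_candidates(candidates, english_roles):
    """Keep the candidates whose projected roles match the English analysis."""
    target = {role: frozenset(words) for role, words in english_roles.items()}
    return [name for name, roles in candidates.items() if project(roles) == target]

# Assumed analysis of the found translation "I saw a girl and a dog."
english_roles = {"subject": {"I"}, "object": {"girl", "dog"}}
selected = consistent_candidates(CANDIDATES, english_roles)
```

Only the first interpretation survives, because the second would require "girl" to appear in the English subject.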

The precise dependency structure analysis, which has been conventionally difficult, is now possible by feeding back the information concerning the dependency structure in the translation document to the dependency structure analysis step (35).

Japanese sentences are substantially different from English sentences in word order, and English grammatical restrictions on word order are strict. A modification destination that is ambiguous in a Japanese sentence may be clear in the English sentence, and vice versa.

In the case of the translation sentence “I saw a girl and a dog./watashi wa shojo to inu wo mita.” in the above example, the phrase “and a dog” is clearly dependent on the word “saw” in English. However, in the Japanese sentence, it is ambiguous as to whether “shojo to” modifies “inu wo” as a parallel phrase thereof or “mita”.

Conversely, in the case of a translation sentence “I saw a girl with a telescope./watashi wa bouenkyou de shojo wo mita.”, the English sentence is ambiguous as to whether “with a telescope” is dependent on “saw” or “a girl”. In the Japanese sentence, analysis easily concludes that “bouenkyou de” modifies “mita”.

The latter example shows that the input of a Japanese translation document is effective when an English document is input as a monolingual document.

In addition to word order, grammatical information may be effectively used. For example, the grammatical information includes articles, singular or plural forms of a noun, conjugation information of a verb including gerund and infinitive in English language, and information of a postpositional word in Japanese language.

For example, a Japanese language sentence “kare (he) wa hon wo kaki (write), shuppanshiteiru (publish) hito (people) wo sonkeishiteiru (respect).” is ambiguous as to whether the one described by “‘hon wo’ kaiteiru” (writing a book) is “kare” (he) or “shuppanshiteiru hito” (people who publish).

If a translation sentence “He respects people who write books and publish them.” is input, it is grammatically clear that the verbs after “who” are dependent on “people” (because the verbs do not end with the “s” that is used in the third-person, present-tense, singular forms thereof). An analysis thus correctly shows that the ones described by “hon wo kaiteiru” (people who write books) are “shuppanshiteiru hito” (people who publish).
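The agreement cue used in this example can be sketched as below. The heuristic is deliberately naive and is an assumption made for illustration: it treats any present-tense verb ending in "s" as third-person singular and ignores irregular verbs, so it is not a general agreement checker.

```python
# Grammatical number of the two candidate subjects in the example sentence.
CANDIDATE_NUMBER = {"he": "singular", "people": "plural"}

def verb_number(verb):
    """Naive heuristic: a present-tense verb ending in "s" is third-person singular."""
    return "singular" if verb.endswith("s") else "plural"

def possible_subjects(verbs, candidates=CANDIDATE_NUMBER):
    """Return the candidate nouns whose number agrees with every given verb."""
    return [
        noun
        for noun, number in candidates.items()
        if all(verb_number(verb) == number for verb in verbs)
    ]

# Verbs after "who" in "He respects people who write books and publish them."
subjects = possible_subjects(["write", "publish"])
```

Because "write" and "publish" lack the singular ending, only "people" remains as a possible subject, which is the conclusion drawn in the text.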

Information as to whether there is an omitted word is also used. In Japanese language documents, a subject is frequently omitted (zero pronouns are frequently used). In English documents, a subject is essential in many cases, and an ambiguous portion with a subject omitted is compensated for by the English document.

This technique is effective when a subject must be identified using a case analysis.

For example, Japanese sentences reading “tomodachi (friend) to resutoran (restaurant) e ikimashita (went). yumeijin (celebrity) ni aete (met) rakii (lucky) deshita.” are ambiguous as to who is lucky, I or the friend, or both. The Japanese sentences are also ambiguous as to whether a single celebrity or a plurality of celebrities were there. An English translation of the Japanese sentences “I went to the restaurant with my friend. We were lucky because we met a celebrity.” clearly conveys that both were lucky and that they met one celebrity.

The ambiguity of a word meaning may be resolved by a translation, and the ambiguity in the syntactic dependency may thereby be resolved as well. An English sentence as an original language and a Japanese sentence as a translation may be input.

For example, an English sentence reading “He saw a girl laughing at the second story.” is unclear. The sentence could have three meanings, i.e., “He saw a girl listening to and then laughing at the second story.”, “At the second floor, he saw a laughing girl.”, “He saw a girl who was laughing at the second floor.” In other words, the English sentence is ambiguous as to whether “at the second story” is dependent on “laughing” or “saw”.

A Japanese translation reading “kare wa nibanme no hanashi wo kiite waratteiru shojo wo mita.” clearly conveys that story means “tale” rather than “floor”, and analysis correctly concludes that “story” is dependent on “laughing”.

From the foregoing discussion, the information of the translation contributes not only to syntactic structure analysis but also to the resolution of word meaning ambiguity. As a further example, the ambiguity of the meaning of the English word “bank” is considered.

The English word “bank” is ambiguous with two meanings, “ginko (a business organization)” and “dote (land along the side of a river)”, while Japanese “ginko” and “dote” are distinct words with different meanings. Such ambiguity is easily resolved by examining which word corresponds to the word “bank” in the Japanese sentence.
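A lookup of this kind might be sketched as follows. The sense table, the function name, and the tokenized Japanese sentences are hypothetical and serve only to illustrate the idea; the description does not specify how the corresponding word is located.

```python
# Hypothetical sense inventory: English word -> {Japanese equivalent: gloss}.
SENSES = {
    "bank": {
        "ginko": "a business organization",
        "dote": "land along the side of a river",
    },
}

def resolve_sense(word, translation_tokens):
    """Pick the sense whose Japanese equivalent appears in the translation.

    Returns (japanese_word, gloss), or None when zero or several senses match,
    in which case the translation does not settle the ambiguity.
    """
    matches = [
        (japanese, gloss)
        for japanese, gloss in SENSES.get(word, {}).items()
        if japanese in translation_tokens
    ]
    return matches[0] if len(matches) == 1 else None

# Invented, pre-tokenized Japanese translations used purely as examples.
financial = resolve_sense("bank", ["kare", "wa", "ginko", "e", "itta"])
riverside = resolve_sense("bank", ["kare", "wa", "dote", "wo", "aruita"])
unresolved = resolve_sense("bank", ["kare", "wa", "aruita"])
```

Returning None when no unique match exists keeps the sketch from guessing a sense that the translation does not actually support.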

Clarifying the ambiguity of word meaning using the translation language makes it easy to determine the modification destination, thereby contributing to a precise syntactic structure analysis. Based on the fixed word meaning, the syntactic structure analysis, namely, the dependency structure analysis step (35), is performed. If the dependency structure analysis step (35) results in one analysis result, or if the analysis result shows a likelihood equal to or higher than a predetermined threshold in the machine learning, the algorithm proceeds to the case analysis step (36).

The present invention provides a novel parsing apparatus that performs an extremely precise syntactic structure analysis by inputting the translation document in addition to the known technique of syntactic structure analysis of the monolingual document.

In particular, when one language having mild word order and another language having strict word order are available, the word order of a document in the strict word order language is analyzed. If a plurality of analysis results are obtained in the mild word order language, an analysis result recognized in the strict word order language may be adopted in the course of analysis. Syntactic structure analysis is thus easily and precisely performed.

The present invention thus constructed provides the following advantages.

One of claims 1 through 4 provides a high-precision syntactic structure analysis method to identify a syntactic structure analysis result from among a plurality of syntactic structure analysis results. It should be noted that identifying one from a plurality of analysis results has been conventionally difficult.

If a sentence in one language such as Japanese language is open to several interpretations because of the mild word order rule thereof, a known technique selects a likely interpretation based on a vast amount of accumulated knowledge. However, in accordance with the present invention, an appropriate interpretation is made by inputting a language having a strict word order rule as a translation.

The present invention allows the grammatical information other than word order to be effectively used. When a subject in Japanese language is ambiguous, the subject is correctly identified from a singular or plural English form. Analysis precision is thus heightened.

The information concerning a word omission may be used. When a subject must be identified using the case analysis in a Japanese language sentence, a conventional single language analysis alone cannot predict the subject. In accordance with the present invention, the subject is exactly identified by referencing the English sentence. Analysis precision is thus heightened.

It is not rare that a single word has a plurality of word meanings in one language. In the conventional syntactic structure analysis method, an erroneous analysis is sometimes performed based on an erroneous word meaning recognition. The present invention identifies an exact word meaning from a translation, and the syntactic structure analysis precision level is heightened.

The above method permits a precise syntactic structure analysis by simply using translation texts that often already exist, and is much easier than selecting an optimum analysis result through human intervention in the course of the syntactic structure analysis. The above method thus satisfies the requirements for the automation of the syntactic structure analysis and language processing.

The parsing apparatus of one of claims 5 through 7 automatically performs the syntactic structure analysis including the morphological analysis, the dependency structure analysis, the case analysis, etc., in response to the input of at least two languages in translation relation to each other. For example, if a dependency structure is unknown, documents in translation relation to each other are analyzed. An appropriate dependency structure is thus determined from the result. The present invention thus provides a high-precision parsing apparatus that can be substituted for the conventional parsing apparatus.

The present invention may be advantageously implemented in a translation system that generates a third language, by inputting a plurality of languages in translation relation to each other.

1. A parsing method for language processing, comprising: inputting through original text input means an original text to be parsed, and through translation text input means at least one text, at least a portion of which is in translation relation to the original text, parsing the original text and the translation text through parsing means that uses a machine learning model, identifying optimum syntactic structure analysis information of the original text from the syntactic structure analysis information of any of the translation texts using optimum result identification means based on the syntactic structure analysis information of the translation text if at least two pieces of syntactic structure analysis information are acquired from the original text, and outputting the identified syntactic structure analysis information as the syntactic structure analysis result of the original text through syntactic structure analysis result output means.
2. A parsing method according to claim 1, wherein if the parsing means using the machine learning model results in at least two pieces of syntactic structure analysis information from the original text, the optimum result identification means acquires the syntactic structure analysis information based on at least one of word order information, grammatical information, information regarding the presence or absence of an omission, word meaning information in any of the translation texts, and identifies the optimum syntactic structure analysis information of the original text from the syntactic structure analysis information of the translation text.
3. A parsing method according to one of claim 1 or 2, wherein if the parsing means using the machine learning model results in at least two pieces of syntactic structure analysis information from the original text, the parsing means using the machine learning model solves the ambiguity of the meaning of a word by acquiring the syntactic structure analysis information based on the word meaning information of any translation text, and parses the original text again based on the fixed word meaning.
4. (canceled)
5. A parsing apparatus for language processing, comprising: original text input means for inputting an original text to be parsed, translation text input means for inputting a translation text, at least a portion of which is in translation relation to the original text, with translation relation being associated therebetween, morphological analysis means for morphologically analyzing the input original text and the input translation text, parsing means for parsing the morphologically analyzed result using a machine learning model, optimum result identification means for identifying optimum syntactic structure analysis result of the original text by referencing the syntactic structure analysis result of the translation text if a plurality of pieces of syntactic structure analysis information is acquired from the original text or one of the plurality of pieces of syntactic structure analysis result fails to exceed a predetermined likelihood, and syntactic structure analysis result output means for outputting the optimum result.
6. A parsing apparatus according to claim 5, wherein if at least two pieces of syntactic structure analysis information are obtained from the original text, the optimum result identification means acquires the syntactic structure analysis information based on at least one of word order information, grammatical information, information regarding the presence or absence of an omission, word meaning information in any of the translation text, and identifies the optimum syntactic structure analysis information of the original text from the syntactic structure analysis information of the translation text.
7. (canceled)